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Abstract — Image registration is an important tool for medical 
image analysis and is used to bring images into the same reference 
frame by warping the coordinate field of one image, such that 
some similarity measure is minimized. We study similarity in 
image registration in the context of Locally Orderless Images 
(LOI), which is the natural way to study density estimates and 
reveals the 3 fundamental scales: the measurement scale, the 
intensity scale, and the integration scale. 

This paper has three main contributions: Firstly, we rephrase 
a large set of popular similarity measures into a common 
framework, which we refer to as Locally Orderless Registration, 
and which makes full use of the features of local histograms. 
Secondly, we extend the theoretical understanding of the local 
histograms. Thirdly, we use our framework to compare two state- 
of-the-art intensity density estimators for image registration: The 
Parzen Window (PW) and the Generalized Partial Volume (GPV), 
and we demonstrate their differences on a popular similarity 
measure, Normalized Mutual Information (NMI). 

We conclude, that complicated similarity measures such as 
NMI may be evaluated almost as fast as simple measures such 
as Sum of Squared Distances (SSD) regardless of the choice of 
PW and GPV. Also, GPV is an asymmetric measure, and PW is 
our preferred choice. 

Index Terms — Similarity measure, registration, normalized 
mutual information, density estimation, scale space, locally or- 
derless images. 



I. Introduction 

IMAGE similarity measures are crucial components in 
image registration, and Mutual information (MI) (TJ, (2) 
and Normalized Mutual Information (NMI) (3) are consid- 
ered state-of-the-art for image registration. MI and NMI are 
particularly useful for registering Magnetic Resonance Images 
(MRI) to MRI as well as for multi-modal image registration in 
general. MI and NMI are entropy based measures and hence 
rely on probability distributions. Probability distributions are 
most often approximated by discrete histograms, which poses 
a challenge to gradient based optimization schemes. The 
most common estimation techniques are: The Parzen Window 
(PW) [2] and the Generalized Partial Volume (GPV) (4), (5). 
Empirical comparisons have previously been presented (6), 
and recently we investigated their theoretical connection (7J. 

In this paper, we present Locally Orderless Registration 
(LOR). LOR is a framework for performing registration of TV- 
dimensional images, and it includes a common framework for 
a wide range of image similarity measures such as Correlation 
Ratio, MI NMI, Huber Norm etc.. The framework is centered 
around local histograms, and we use Locally Orderless Images 
(LOI) (8), (9), which makes the 3 natural scale parameters 
available for image registration: measurement scale - smooth- 
ing of the initial image, intensity scale - smoothing of the 
histogram, and integration scale - the local spatial extend 
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of local histograms. These 3 scales interact in a nontrivial 
manner, and we explore their relation theoretically by the 
local intensity moments as well as on a simple local image 
model. We perform extensive empirical investigations on the 
influence of the scales on the density estimates as well as 
NMI. We also summarize and extend our earlier theoretical 
work [7], where LOR is used to compare PW and GPV, and 
we demonstrate both theoretically as well as empirically that 
GPV is asymmetric, and therefore the less preferred choice of 
the two. Finally, we present a unifying algorithm for PW and 
GPV for various measures, and we present analytical as well 
as empirical investigations of its computational complexity. 
Timing results on our algorithm shows that NMI is almost 
as fast as Sum of Squared Differences (SSD), and that multi- 
threaded implementation only has 13% overhead compared to 
the theoretical computational speed. 

A. Previous work 

The use of Mutual Information for image registration was 
originally proposed by (T), (2). An extensive overview was 
given in (TO) . Normalized Mutual Information was introduced 
as a more robust alternative especially designed for multi- 
modal image registration (3). The first implementations re- 
lied on Powell's method E), hill climbing [3], or similar 
methods without gradients, which were accurate but slow. 
A GPU speed-up was suggested in (TTJ. Today, state-of-the- 
art implementations are gradient based methods and group 
in two algorithm types: The first type is based on PWs 
|2] and relies on the fact that the marginal and the joint 
histograms are made continuous by using different kernels, 
e.g., Gaussian or B-splines (12). The second type is based on 
GPV, where the distribution is sampled from the image directly 
|4|. Analytical derivatives of this method were presented in 
|13| and a generalization using B-splines was presented in 

fA variational method relating to Locally orderless images 
for MI (and other measures) was presented in (14). GPV 
and PW was compared numerically in [6] concluding that 
PW is precise and GPV has a larger convergence radius. MI 
and NMI are notorious for their local minima and difficulty 
of implementation, and the choice of interpolation scheme 
greatly influences the smoothness of the objective function. 
Some investigations into this can be found in (15), (16). An 
alternative approach is the Conditional Mutual Information 
(17). In (7) we investigated PW and GPV for NMI, using 
differential calculus in a thorough step-by-step presentation. 
The derivations was an alternative to the variational approach 
in p4| , and our approach revealed much faster algorithms 
and a direct comparison between PW and GVP. (T4| allowed 
for a local variant of MI, which was implemented in [18]. 
Furthermore, a density alternative though computational com- 
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plex estimation scheme was suggested in [19 ] but is however 
unsuited for fast gradient based optimization schemes. 

The remainder of this article is organized as follows: In 



Section 
Section 



IN the general registration framework is described. In 
III] we revisit Locally orderless images as a basis for 



analysing GPV and PW as well as discuss relation between 



scales for local histograms. In Section [IV] we provide a 
theoretical comparison between GPV and PW, and in Sec- 
tion [V] we augment the theoretical comparison with empirical 
demonstrations of the asymmetry. In Section [VI| we discuss 
empirical relations between scales. In Section [VIl| we present 
a fast algorithm for computing PW and GPV for a large range 
of similarity measures, and in Section VIII we summarize our 
findings and conclude on our work. 

II. Image registration 

Image registration is the process of transforming one image 
/ : Q -> T, where Q C R N and T C R, w.r.t. a reference 
image R : ft — » T such that some functional F{I^R) is 
minimized. We consider diffeomorphic transformation of M 
parameters, <fi : Q x R M — Q, and for brevity we write 
I = I o (p. We consider functionals, J 7 , of the form, 



F = M(I,R)+S(4>), 



(1) 



where M is a (dis-)similarity measure and S is a regularization 
term. Typical forms of S is elasticity (20), fluid |2T| or the 
recent Kernel Bundle LDDMM [22]. Our focus is solely on 
M. 

A. The Similarity Mearsure 

Many similarity measures are on the form of, 

Mn= f F(x,I(x),R(x))dx, (2) 
Jn 

where we in this article make the distinction between differ- 
entials as the element wise differentials and the hyper volume 
elements dx = dx\ A • • • A dxw used here for integration. 
A popular choice of the loss-function, F, are monomials, 
F(I(x),R((x))) = (I(x)-R(x)) q for<? > 0. Other similarity 
measures have the form of, 



Mi 



r 2 



F(x,i,j,h IiR (i,j))diAdj, 



(3) 



where hi^R : T 2 — >> R + is the joint histogram of image I 
and R with intensity variables i and j. A popular choice is 
mutual information [23], Mmi = Hi + T-Lr — Hi,r, where T-L 
is the (joint) entropy, in which case F = p(i, j) lnp(i, j) — 
p(i) \np(i) —p(j) lnp(j). The natural logarithm is often used 
for convenience, and the distribution p is the normalized (joint) 
histogram p(i,j) = h(i,j)/ J F2 h(i,j)diAdj, such that p(i) — 
f r p(hj)dj, and p(j) = f r p(ij)di. 

A seemingly major difference between ^ and ^ is the 
integration domain. However, we will show that by reordering 
the integral by the distribution of / and R values, we may 
rewrite ^ in terms of local histograms h(x,i,j). This has 
several advantages: 1) It creates a common form for both 
classes of similarity measures. 2) The histogram perspective 



reveals the 3 fundamental scales of images: intensity, measure, 
and integration. 3) The loss-function F for q-norms and similar 
becomes linear w.r.t. the transformation parameters. 4) With 
the use of smooth kernels, the derivatives w.r.t. space as well as 
intensity are trivial, and thus are readily available for gradient 
descent schemes. There is, however, a minor disadvantage: 
Continuous histograms have poles, corresponding to image 
values, where the spatial gradient of the image is zero. In 
practice this is of little importance, since we consider generic 
images, i.e., images whose structure is stable w.r.t. negligible 
noise, and for such images, the set of areas with zero gradients 
are singular points with measure zero. We will assume that the 
poles in the histograms likewise have zero measure, which 
is supported by our observations, but which we leave to be 
proven in later work. 

A wide range of loss-functions, F, linear as well as non- 
linear can be formulated in our common framework. We 
refer to this framework as the Locally Orderless Registration 
framework (LOI registration), and it has the form, 



M 



Qxr 2 



F (x, i, j, hi iR (x, z, j)) dx A di A dj. (4) 



Most functionals in the literature are positional independent, 
wherefore we will adopt the same focus, and further we denote 
the remainder as either M\m or A^niin- The similarity measure, 
.Miin, uses a position independent, linear loss-functions, 

Mm= F(i,j)h IyR (iJ)diAdj. (5) 

This measure includes ^ with any positional independent 
loss-function F such as monomials, it is linear in h w.r.t. F 
and h, and the transformation parameters only influences h. 
The similarity measure, jM n iin» uses a position independent, 
non-linear loss-functions, 



A^niin = / F(h IjR (iJ)) di A dj, 
Jr 2 



(6) 



where F now denotes some non-linear functional, and this 
form includes Mutual Information. As will be shown later, 
the added complexity from linear to non-linear measures has 
little influence on computation time. 

Position independent, linear loss-functions, Mm, are all 
linear in h w.r.t. transformation parameters. Examples are, 
q > 0: 



F iq (iJ) = \i-j\ 



(7) 



r r m \ j y\i-3\-k) q if \i~j\ > fc, 
1 otherwise, 




qk q ~ x (i — j) — (q — l)k q otherwise, 
i-j\ q if|i-j|<fc, 



k q 



otherwise. 



if \i-j\ < k, 

(9) 
(10) 



Position independent, non-linear loss-functions, A"f n iin, in- 
clude Mim as well as mutual information (MI), normalized 
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mutual information (NMI), and Correlation Coefficient (CC), 

(11) 
(12) 



Mm = Hi + H R - Hi,r, 
Hj + Hr 



r 2 °i°r 



Pi,RdiAdj, (13) 



where /ij = J r2 ipi, R (iJ) di A dj, aj = J r2 (i - 
Vi) 2 pi,R(hj) diAdj, and similarly for /jlr and ctr, and where 
T-L denotes the marginal and the joint entropy of the intensity 
distribution (23), 

Hi = - J pi(i)hipi(i)di, (14) 

U R = - J p R (j) In p R (j) dj, (15) 

Ki,r = - / pi,R(i,j)lnp IiR (i,j)diAdj, (16) 
Jr 2 

The distributions are obtained by normalizing the histograms 
to unity, 



p{i) 



P(i,j) 



h(i) 



J r2 h(k, I) dk A dl 



(17) 
(18) 



Position dependent, non-linear loss-functions, A4, include 
A^niin as well as Correlation Ratio (CR). Correlation Ratio 
for image registration was proposed in [24], but originates 
from analysis of variance and is based on the factorization of 
the variance into variance within classes and between class 
averages (25). E.g., consider an image /, segmented into 
regions denoted by R such that region Qj = {x : R(x) = j}, 
and assuming uniform spatial distributions. Then correlation 
ratio is defined as, 



where 



Q\ a 2 ^ mi <j- 

3 3 



(i) di, 



(19) 



(20) 



Q\/jl = / I(x)dx= / ih( 
Jq Jr 

1\(J 2 = [ I(x) 2 -^ 2 dx= I \i 2 - v 2 )h(i)di. (21) 
Jq Jr 



and 



\n 



jlfij = / I(x)dx= / ihj(i)di, (22) 

|^|cr| = / /(ic) 2 - /i 2 (ix = / (i 2 - /i 2 )^(i) dv. (23) 

for the corresponding local histograms hj of intensity values 
in / of segment Qj. 

In this paper we will consider Normalized Mutual Infor- 
mation (NMI) [3], which has proven to be very powerful for 
registration of medical images in general. 



III. Density estimation 

A common algorithm for estimating the histogram of an 
image is counting: Given an image I(x) = i, a set of bin- 
widths and sample points, Ai n > and i n , m > n => i m > 
in , and an indicator function, 



1, if i n < i < i r , 
0, otherwise. 



Pn(i) 

Then the histogram may be found as, 



h(n) 



P n (I(x))dx, 



(24) 



(25) 



or as a sum using a suitable discretization of Q. The bin- widths 
act as scale parameters in the sense that when increasing Ai n , 
then the histogram will contain less detail about the intensity 
distribution. This can be stated precisely: Select a discrete set 
of sample points and bin- widths such that Ai n = i n +i — in, 
and consider 2 neighboring histogram values, h(n) and h(n + 
1). In this case, the sum, h'{n) = ft(n)+ft(n+l), is equivalent 
to evaluate the integral with a modified indicator function 



1, if i n < i < i n +i + Az n+ i 
0, otherwise, 



AiL 



(26) 

where Ai' n = Ai n + Az n+ i. By induction it becomes clear 
that filtering h(n) with a Boxcar function (0-order b-spline) 
of height 1 and width m is equivalent to increasing the extend 
of the indicator function as Ai' n = Y^!k=o Ai n +h- Thus, 
increasing Ai is equivalent to smoothing the histogram with 
a Boxcar function. 

In general, the interesting scales of i are not given by the 
data, and therefore the only option is to study all scales, that 
is, all discretizations of intensity. Along with the scale- space 
on the spatial parameter x, this leads to a scale-space theory 
for space and intensity known as Imprecision Space |8|. In the 
general case, histograms are local. Since the scale of the region 
of interest is not generally given, then we are also required to 
study all scales. This scale we denote the integration scale, and 
the combined construction is called Locally Orderless Images 
(LOD (9). 

A. Estimating local histograms 

According to Locally Orderless Images, a local histogram 
is obtained as follows: First a (possibly deformed) image / 
is smoothed with the kernel K, a soft isophote i is extracted 
using kernel P, and finally the isophote mass is calculated in 
a neighborhood of a point x with kernel W. Formally, 

ftj(i, x, a, /3, a) = P(I(x, a) - i, 0) * W(x, a), (27) 
cr) = I(x) * K(x, cr), (28) 

where P : R x R + — » [0, 1] is an intensity measurement 
of scale fj and is often called the Parzen Window (PW), 
K : M N x R + — R + is a spatial measurement kernel of 
scale a, W : R^ x R + — >> R + is an integration window 
of integration scale a, • * • is the convolution operator taken 
w.r.t. the variable x, and <l> G R M denotes the param- 
eters for the transformation. The histogram Jir is defined 
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Histogram of Original Image 





Histogram of Smooted Image Matlab Histogram (bin width=0.0016 

150 



0.46 0.48 0.5 0.52 0.54 



(a) (b) 
Fig. 1. (a): A random image and (b): its histograms. 



Smoothed Image (sigma=4) 



Histogram of Smooted Image 



40 50 60 




(a) 



Fig. 2. (a): The image in Fig. |l(a)| smoothed with a = 4 and (b): its 
histogram. 



similarly independently of <I>. In (9| it is proposed to use 
P(i,/3) = e -* 2 /^ 2 ), K(x,a) = e— T -/( 2 - 2 )/(2^a 2 ) iV / 2 , 
and W(x,a) = e -* T *l&°?) /(2na 2 ) N / 2 , which implies the 
structure of the heat diffusion in all 3 scale parameters and 
is considered the simplest structure imposable for studying 
data by all scales. In typical registration scenarios, such as 
registering CT and MR images, intensity and spatial scale 
are of quite different nature. The spatial scales can often be 
related to a common frame of references, but this is not as easy 
for intensity scales, which indicates that information measures 
may be preferable for multimodal registration. 

To give intuition we will discuss the calculation of the local 
histograms in a step by step manner including the meaning 
of the various scale parameters. Consider a random image 
and its histogram as calculated by the Matlab hist function, 
shown in Fig. [T] In terms of local histogram parameters, this 
corresponds to: a = oo, a = 0, and f3 = \j\jYl, the standard 
deviation of a Boxcar function of width 1. 

a) Image smoothing with K: First step in calculating a 
local histogram is to smooth the image with kernel K. The 
kernel K controls the image scale, a. This is illustrated in 
Fig. [2] and corresponds to a = 00, a > 0, and f3 = Ai/y/l2 9 
where Ai is the original intensity scale. Since smoothing an 
image implies a monotonic contraction of image intensity 
around the mean value, we expect that the histogram is 
likewise contracted, when increasing a. This is confirmed by 



the experiment illustrated in Fig. 2(b) 



b) The Parzen window, P: Second step is to calculate the 
soft isophote i with kernel P: The kernel P controls intensity 
scale, f3. This is illustrated in Fi g. [3] a nd corresponds to a = 0, 
a > 0, and f3 > 0. Fig. 3(b) and 3(c) show the spread of 2 fixed 
isophotes for the chosen P. For a fixed position x the image 
contains the value of the local histogram at x. Hence, the stack 





0.46 0.48 0.5 0.52 0.54 

(a) 



0.46 0.48 0.5 0.52 0.54 

(b) 



Fig. 4. Measuring histograms of Fig. [2] (a): A local histogram using a = oo 
and f3 = 0.005, and (b): a histogram using Matlab's hist function. 



of images for all isophotes gives all the local histograms. The 
spread of a soft isophote depends on the image geometry at 
I(x,a) — i: The spread will be large, where the gradient 
magnitude is small, and small, where the gradient magnitude 
is large. In general the width f3 acts as the bin- width in the 
histogram, and varying f3 corresponds to varying the degree 
of smoothing of the histogram, which is illustrated in Fig. [4] 
c) Locality, W: Last step is to calculate the local 
isophote area near x with kernel W: The kernel W controls 
the locality of the local histogram, a, illustrated in Fig. [5] 
Note that the histograms change quite significantly with the 
position of the kernel W. 

B. Some relations between scales 

In general, varying j3 and varying a yields different results, 
since the width of a soft isophote in a point is proportional 
to the gradient in the point, while the extend of local average 
is irrespective of the gradient in the point. In addition, near 
the symmetry set (26), the soft isophote will have a ridge like 
behavior. 

The relation between a and a may be stated in terms of 
the histogram raw moments and central moments. The raw 
moment and central moments of order n > of the histogram 
h at position x are given as, 



H n = / i n h(i,x)/k(x) di 

J — oo 

/CO 
(i — n(x)) n h(i, x)/k(x) di 
-oo 



(29) 
(30) 



where k(x) = j^h^^x) di and fi(x) = fi[ is the mean 
value. In the following, we will evaluate these moments. 
• Normalization constant k: Convolution is linear, thus 
k(x) — ( P(L(x) — i) di) * W(x). Since the image 
value L(x) under the integral acts as a translation of the 
Parzen window, and since the integral is over the entire 
domain, we conclude that the integral is independent on 
x, and thus 



k 



P(i) di 



(31) 



independent on x. In case of a Gaussian Parzen window 
of variance f3 2 , then /foams = PV^tt- 
Mean value fi: If the Parzen window, P, has zero mean 
value, then the mean value of P(L(x) — i) is L(x), and 
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Local Histogram (x=(16,16), alpha=8) 




(c) (d) 

Local Histogram (x=(48,48), alpha=8) 




(e) (f) 

Fig. 5. Examples of local histograms generated by Locally orderless images 
in neighbourhoods as indicated by the red overlays. 

thus 

/oo 
ih(i, x) jk di = k^Lix) * W(x) (32) 
-oo 

= k~ 1 I(x) * K(x) * W(x) (33) 
= k~ 1 I(x)*W\x), (34) 



where W (x) — K(x) * W(x). In case of Gaussian K 
of variance a 2 and W of variance a 2 , then W'{x) is 
Gaussian of variance a 2 + a 2 . 
• Raw moments \j! n : The raw moments of h may be 
constructed from the moments of P/k. Writing the raw 
and central moments of P/k as r)' n and rj n , we have that 




The n'th central moments for a Gaussian of variance f3 2 is 
(n — l)!!/3 n for even n and otherwise, where (n — 1)!! is 
the double factorial function. Thus, the n'th raw moments 
of h is ii' n = r}' n * W(a?), which is linear combination of 
terms L(x) n ~i * W(a?), j = . . . n, and the relation 
between a and a is non-linear for most cases of n and 
j. For Gaussian K and this is a pseudo-linear scale- 
space [27]. Examples are given in Table [I] 
• Central moments [i n : The central moments of h may be 
constructed from its raw moments, since 




Examples are given in Table [I] 
Some intuition may be obtained by considering an image, 
which in the neighborhood of the point Xq is linear with 
gradient VI(x). The image in the neighborhood of Xq is then 
given as 

I{x) ~ I(x ) + {x- x ) • VJfco), (38) 

and the isophotes near I(xq) are all lines perpendicular to the 
gradient. The image in the neighborhood around Xo is invariant 
w.r.t., smoothing with symmetric and normalized kernels, 
hence a has no influence on the local histograms for small val- 
ues of a. However, the interplay between j3 and a is nontrivial: 
The soft isophotes are constant perpendicular to the gradient. 
Hence, we may consider this a one dimensional problem along 
the axis of the gradient, say x, and I(x) ~ ax + b, where 
a = |W(aso)|, ax = (x — Xq) ■ VI(xq), and b = I(xq). The 
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1 



/? 2 


3/5 4 




1 

L(X)> 



W(x) 



(L(x) 2 +(3 2 ) *W(x) 
{L(x) s +3P 2 L(x)) *W(x) 
\l(x) a + 6p 2 L(x) 2 + 3/3 4 ) * W(aj) 
(L(jc) 5 + 10(3 2 L(x) 3 + 15/3 4 L(£c)) * W(x) 

TABLE I 

Examples of raw and central moments pt! n and /i n of order n, when the Parzen window has central moments rij , j 

a Gaussian of zero mean and variance (3 2 . 



1 



2(Mi) 3 - 
-3K) 4 
4(Mi) 5 - 



/4 

f 6K) 2 ^- 

10(^)3^ + 



10K) 2 



m 4 

~ 5 M>4 " 



^5 



... n, AS DOES 



soft isophote b using Gaussian P is P(ax,f3) = P(x,(3/a), literature. Consider ([27])— ([28]) and let a 00. In that case, 
convolution with a Gaussian integration kernel W(x, a) yields the window hi simplifies as, 



another Gaussian 



P(ax, p) * W(x, a) = P(x, ^f3 2 /a 2 + a 2 ), (39) 



hi(i, x, <3>, a, /3, a) — >> const. / P(/(i/?, 3>, a) — z, /3) 

7q 



(45) 

due to the semi-group properties of Gaussian convolution. For j p(/(^ <& a) — i (3) dtf> 



complicated. J F J n P(I(^ a) - j^) d^ A dj 

(46) 



non-linear images the interplay between a, /?, and a is more a, /?, a) — » ° 

_ . . , , _ . . Choosing 
C. Estimating local densities 

The local density distributions are obtained by normalizing P(h P) = e ^ ^ \ (47) 

t0 unit y' we find that 

Pl (i\x,*,a,l3,a)~ w^'t'"' fl'^L , (40) f f P{lW,*,a) -j,f3)difi Adj = \Sl\y/torf2, (48) 

J r h I (j,x,9,a,fi,a)dj J r J Q 

Pl (i\<S>,a,!3,(T) = j Pl (i\x,*,a,p,tT)dx, (41) and 
where we have assumed (conditional) independence |^|y27r/3 2 

and uniformity such that pi(i,x\&,a, /3,a) = (49) 
jp/(z|a:,*,a,^,(7)/|fi|. The density p R is defined in a Likewise, we have 
similar manner. As fT4| , we extend the concept to the joint 

distributions as follows, Pi Ah .7 a, /3, <r) -> 

r e -(I( X ,3>,*)-i) 2 + (R( X ,a)-j) 2 /(2{3 2 ) fa 

h ItR (i,j,x,*,a,l3,a)= ^ . (50) 

(P(I(x,®,*)-i,f3)P(J(x : *)-j,f3))*W(x,a) : (42) 1 1 ^ 

hi rU j & x a (3 a) * S P rec i se ty me method using a Gaussian kernel 

Pj 5J r(2,.7'|£C, a, /3, cr) ~ — - 1 ' ' — - — ' ' - — , with infinite support |2|. Similar results are obtained for any 
J F 2 i,r{ , , , , P, j integrable Parzen window, P(i,(3). The PW can be interpreted 

as a globally orderless image, as W defining the locality 

Pl , R {i,j\3>,a,p,cj) = 7^7 / p /ifl (i, jlS.as.a^.^daj, extends § loball y- 
1**1 ./n As a side note 



As a side note, since both ( |49| ) and ( pO} obey the diffusion 
(^4) equation w.r.t. /3 2 /2, we may use Green's theorem and write, 



where we also have assumed (conditional) independence 
and uniformity such that Pi,Ft(i, j, x\&, a, /3, cr) = 
Pi,R(iJ\x,&,a,Pa)/\n\. 



Pi(i\^ 2 +P 2 )=Pi(i\M*G(i,p), (51) 
PiAiJ\\/Po+P 2 )=PiAiJ\M*G([iJ} T ,P), (52) 



IV. Theoretical COMPARISON OF PWs AND GPV for fast computation of a range of PW sizes. Further^ -+ 
density estimation m MI for 2D ima g es reduces to - log(Z(W, VR)) I.28J, i.e., 

the angle between the gradients of the images at x, which is 
Locally orderless image is the cornerstone in understanding s i m ii ar t0 Normalized gradient fields proposed in (T6). 



the difference between the PW and GPV density estimators. 

B. GPV is an approximation of Locally orderless images 
A. The PW is a special case of Locally orderless images shortly after the introduction of PW? partial volume (PV) 

The PW, originally proposed along with MI in [2], is a was introduced in (4| and extended to GPV in (5). Unlike PW, 
special case of Locally orderless images, often used in the GPV estimates a global density as a sum of local densities and 



samples the intensity values directly from the image in the 
local neighborhood W. GPV may be derived from the joint 
histograms as follows. First, calculate the joint histogram, 

hi,R{i,j,x,a,P,a) 

P(I(iP, a) - i, p)P(J(iP, a) - j, p)W{x - iP, a) 

(53) 

*P(J(x,a)-j,f3) [ P(I(iJ>,a)-i,f3)W(x-i>,a)d*l> 

(54) 

= P(J(x, a) - j, /?) [P(I(x, a) W(x, a)} (55) 

Then set P to a Boxcar function, 



otherwise 



(56) 



where fj is chosen such that J(i/?, &,a) is mapped into non- 
coinciding isophotes curves. The motivation for this is that 
all isophotes can be evaluated at x simultaneously and can be 
thought of as a 0-order b- spline PW. When integrating over the 
entire domain the GPV scheme is obtained. Thus GPV uses 
small local histograms integrated to form the globally orderless 
image as in the PW approach. This introduces an asymmetry 
for a > in the joint densities making registration results 
inconsistent w.r.t. inversion. This asymmetry has a direct 
influence on the marginal densities giving 3 different estimates 
of the marginal density: estimated from the histogram of a 
single image, or as the integral of either of the two joint 
histograms. I.e., ignoring the scale parameters, the histograms 
of, say, J are given as, 



Kj)= / P(J(x)-j)dx, 
Jn 



(57) 



and the corresponding marginal in the GPV approximation is 
found either as, 

h{j)= [ [ P(J(x) - j)[P(I(x) - i) *W(x)]didx (58) 
Jn Jr 

= [ P(J(x)-j) [ P(I(x)-i)*W(x)didx, (59) 
Jn Jr 



or as 



h\j)= / / P{I(x) -i)[P(J(x) - j) * W(x)]didx 
Jn Jr 

(60) 

= 11 P(I(x) - i) di P(J(x) - j) * W(x) dx. 
Jq Jr 

(61) 

The difference between these three estimates depends on the 
gradient of I(x), and due to the scale of W, the gradient will 
differ for the two estimates based on the joint histograms. 
The asymmetry in GPV causes M(A,B) ^ M(B,A). In 
the limit of a — » 0, and when using identical kernels and 
parameters as Parzen windows for / and J, then GPV is 
symmetric, but unfortunately at the limit differentiability is 
lost and gradient based optimizations schemes have to be 
abandoned. The consequence of the asymmetry in the estimate 
of the joint distribution will be investigated further in the 
following section. 
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Fig. 6. GPV using NMI is asymmetric and has different optima, when 
comparing A4(A,B) and A4(B,A) under a translation along a fixed axis. 
Images compared are (a) two 3-dimensional Gaussian of standard deviation 
5 and 11, (b) baseline and followup of patient number 16 from the OASIS 
collection (2^). 



V. Empirical investigations into the asymmetry in 

GPV 

GPV is asymmetric, i.e., M(A,B) ^ M(B,A), when 
using GPV. The asymmetry has been analyzed in the previous 
section, and in this section we will demonstrate that the 
asymmetry not only has theoretical but also practical impli- 
cations. We start by illustrating the asymmetry of GPV used 



for NMI. Fig. [6(a)] show M(Ao</>, B) and M(B,Ao 4>) for 
two 3-dimensional images of spatial Gaussian with standard 
deviation 5 and 11 and centered in the middle of the images 
of size 256 x 256 x 128. We apply a translational motion, 
0, one image wrt. the other along a fixed axis and due to 
the symmetry of the Gassians, the points of optima are nearly 
identical. However, on real medical images this is not the case: 
In Fig. 6(b)| we have plotted the cost functional M,(Ao(f), B) 
and M(B, Aocf)) for pure translation of two images of baseline 
and followup of patient 16 from the OASIS collection (29). 
The points of optima are clearly different. 

To empirically investigate this obvious asymmetry of GPV 
using NMI, we have constructed two images with a constant 
gradient, same magnitude but different direction for each as 
shown in Fig. [7] We focus on a single isophote, I(x, y) = Io 
and R(x,y) = Ro, extracted using a Boxcar function. These 
are shown in Fig. 7(c) and |7(d)| The value of the joint 
histogram for these intensities (Iq,Rq) is depicted in Fig. [8] as 
a function of space and using various estimation techniques. 
Fig. |8ta) shows the joint histogram's values when comparing 
Fig.^c) to Fig. gd) using GPV, i.e., where I(x,y) = Io 
is smoothed and intersected with R(x, y) = Ro as M(R, I) 
in GPV, and Fig. [8jc) shows the opposite case, M(I,R). 
For reference, in Fig. [SJb) is show the LOI estimate of the 
intersection of isophote Io and Ro. As it can be seen, the 
spatial distribution of intensities is oriented according to the 
non-smoothed isophote. 

Curvature adds further asymmetry, since the intensity moves 
in the direction of the center of the osculating circle, when 
smoothed spatially. Thus, unless the two images curve in the 
exact same manner, the asymmetric smoothing of the GPV 
method will introduce further asymmetry in the similarity 
measure. This is illustrated in Fig. [9| where an isophote is 
first extracted using Boxcar function, then smoothed spatially 



to give the image shown in Fig. 9(a) and this is to be compared 
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R 




(c) (d) 

Fig. 7. Two artificially generated images with same gradient magnitude 
but different directions, (a) and (b) show the images, and (c) and (d) show 
corresponding single isophotes extracted using the same Boxcar function. 




(a) (b) (c) 



Fig. 8. The GPV approximation is asymmetric, (a) is Ai(A, B) and (c) is 
M(B, A), (b) Is what the (a) and (c) are approximating. As the figure show 
the results are quite different due to the asymmetry, and as the geometrical 
differences between the compared isophotes, so does the asymmetry. 

with isophote extracted with as a soft isophote as shown in 
Fig. |9(b)| It can be seen that the images differ especially, where 
isophotes have high curvature. To substantiate this qualitative 
conclusion, we have conducted the following experiment: For 
a fixed image, an image of a given isophote is extracted using 
the 2 different methods, 1) PW as a soft isophote with fixed 
width /3pw = 0.005, and 2) GPV as an isophote extracted 
using a Boxcar with varying width /3gpv followed by spatial 
smoothing with a Gaussian of varying width a. Thus, for a 
fixed image with PW isophote width /3pw, we have searched 
for the values of /3qpv and a such that they minimize the sum 
of squared differences between the two isophote images shown 
in Fig. |9(c)| Notice in particular, the difference between the 
two images of the isophotes is largest near high curvature of 
the original isophote. 

To empirically evaluate the degree of asymmetry as a 
function of a, we have conducted the following experiment: 
For 10 baseline and followup images from [29], we have 
rigidly registered the baseline and followup pair using NMI 
and GPV with a very small a, and then for a range of as 
measuring the spatial asymmetry in the similarity measure 
along the x-axis caused by the increase in a. This is repeated 
for a range of a values. The result is illustrated in Fig. [10] 




(c) (d) 



Fig. 9. The difference between smoothing Boxcar isophotes and soft 
isophotes appear near high isophote curvature, (a) The GPV isophote 
using/^opv = 0.0013, smoothed with W using a = 0.9. (b) The PW isophote 
using /3pw = 0.005. (c) the signed difference of (a) and (b), and (d) the 
absolute difference for a range of a and /?gpv- 

The experiment reveals that smoothing of the image does 
not eliminate the problem, and as our investigations show, 
asymmetry persist over all image scales. The asymmetry can 
also be observed in the joint density estimates. In Fig. [TT 
is shown the difference between the joint density used to 
evaluate M(B,A) and M(B, A) for 2 different values of a. 
The difference is seen to be non-negligible for both scales, 
and thus cannot be ignored. 

To summarize, the GPV is asymmetric, and the degree 
of asymmetry increases proportionally to curvature of the 
isophotes as well as to a. The asymmetry cannot be alleviated 
using image smoothing, and we conclude that GPV does not 
offer inverse consistent registration. 

VI. Empirical comparison of PW and GPV by 
Scales 

In the following and using NMI, we will empirically eval- 
uate and compare PW and GPV in terms of scales, i.e., the 
influences of the different kernels on the similarity measure, 
NMI, and the estimated joint density distribution to give 
intuition about the influence of different scales on NMI. Two 
types of algorithms for GPV and PW have been implemented: 
A fast cubic uniform B- spline approach (hereafter referred 
to as B- spline), which is described and analyzed in the next 
section, and a version based on Gaussian kernels. For direct 
comparison of B-splines and Gaussians we have estimated 
the variance of a B- spline to be a w 0.6. This allows us 
to investigate the effect of tuning the standard deviations of 
each of the kernels for both PW and GPV. We note here 
that some computational restrictions imposed on GPV due to 
computational complexity, thus a Gaussian with local support 
has been used, i.e., very small values are truncated. We have 
performed intra subject registration using rigid registration on 
a series of Tl weighted MRI of the human brain from different 
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Fig. 10. GPV using NMI gives inconsistent optimization results for a simple, artificial translation, and the inconsistency depends linearly on a but not on 
cr. For each boxplot, the circles represent individual measurement with slight noise added in the horizontal direction for legibility, the black line denotes the 
mean, the dark and light gray areas denote the 50% and 75% fractiles. 
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Fig. 11. The asymmetry of GPV in the estimated densities. We have 
subtracted the joint density distribution estimated in M(A, B) from the 
estimated density distribution in A4(B, A), at 2 different image scales with 
a = 1, 4 and a = 0.2. The Jensen-Shannon divergence is 0.10005 (a) and 
0.27105 for (b) 



subjects (29) . For each subject we registered a followup to the 
baseline, such that the pair the two volumes are aligned close 
to optimally (within 0.5 voxel). For a given direction (x-axis) 
we have translated one of the two with +/ — 1.5 voxels in steps 
of 0.1 voxel and calculated the NMI similarity. This has been 
repeated for a wide range of kernels in the different spaces, 
i.e., different a, /?, and a including our fast B-spline based 
algorithm for 10 different subjects. 
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Fig. 12. The effect of image smoothing on the objective function (NMI) 
using the different density estimation schemes: (a) The PW using a Gaussian 
kernel f3 = 0.6, (b) PW using cubic b-spline, and (c) GPV using a Gaussian 
a = 0.6, (d) GPV using cubic B-spline. 



A. Spatial scale, a 

When registering images, most algorithms exploit the scale 
space of the images by smoothing of the image with the 
kernel K. The idea is to capture large scale structures of the 
images to get closer to the optima before switching scale in 
order to capture structure at a finer scale. The actual influence 
on the different similarity measures has only been vaguely 
investigated in the literature. In spite of this uncertainty, 
smoothing the images is an often used technique, and it has 
been empirically shown to yield good results, e.g., in (30). 
We have examined the effect of image smoothing on NMI, 



and the results can be seen in Fig. 12 for PW and GPV 



respectively. Furthermore, Fig. [13] shows the estimated joint 
probability distribution for both PW and GPV. As can be seen, 
the distribution is more concentrated in a smaller area and 
NMI increases, when cr is large. The figures indicate that PW 
in general has a more pronounced peak than GPV for NMI, 
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Fig. 13. The effect of image smoothing on the joint density using different 
estimation schemes: (a) & (b) The PW using (3 = 0.3, and (a) a = 0.5 and 
(b) a = 2; (c) & (d) GPV using a = 0.6„ p = 0.3 and (c) a = 0.5 and (d) 
a = 2. 
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Fig. 15. The effect of image smoothing with PW on the joint density 
estimate: The PW using a = 1 and (a) (3 = 0.5 and (b) (3 = 2. 
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Fig. 14. The effect of varying /3 for PW and NMI. 

and that the optima is not shifted much over scales for this 
particular set of Tl -weighted MRI of brains and using NMI. 

B. Intensity scale, j3 

The intensity scale controls the resolution in intensity 
domain, and as PW is a smoothing kernel in the intensity 
domain, then entropy is increased (3T| proportionally to f3. 
The smoothing disperses the densities within the joint density, 
thus decreases the overall NMI scores, as can be seen in 



Fig. [14 
in Fig 




[The effect of f3 on the joint density is illustrated 
131 As expected, the joint histogram becomes more 
smooth as f3 is increased.The consequence of increasing f3 
is that small scale changes in the image become neglectable 
(see Section |III-B| ), whereas large changes are preserved, i.e., 
putting more emphasis on large gradients with increasing f3. 
We have not included GPV in this experiment, however, GPV 
also has an intensity scale i.e. the width of its Boxcar function. 

C. Integration scale, a 

PW is the special case of LOI, where a — )> oo, thus a 
global density estimate, whereas GPV is an integration of local 



Fig. 16. The effect of varying a on the NMI functional using GPV with 
a = 0.2. 



densities to become global. GPV uses a Boxcar function for 



P and smoothes the isophotes with W, illustrated in Fig. |8(a) 
and 1 8(c) | The effect of varying a on NMI using GPV is shown 
in Fig. |l6| It is seen that NMI decreases and becomes more 
dispersed as a is increased. Comparing with Fig. 



14 



we note 

that the effect of a on GPV is similar to the effect of f3 on 
PW: it reduces the function value due to the dispersion effect. 
Our theoretical investigation has revealed that smoothing is 
performed asymmetrically for GPV, and this is illustrated in 
Fig. [17] where we see horizontal dispersion but no vertical 
dispersion especially visible in the upper left corner. Previous 
empirical investigations | 6] used the same B -spline kernel as 
PW fj and partial volume a and reported that PW is more 
precise, and that GPV has a larger convergence radius. From 
our experiments it is obvious that this difference is merely a 
consequence of the additional smoothing introduced by W as 
discussed in Section |III-B| This is supported by Fig. [l2j As 



can be seen the PW is significantly more peaked than the GPV, 




(a) GPV 



(b) GPV 



Fig. 17. The effect of the integration scale on the joint density estimate for 
GPV and NMI using a = 1: (a) a = 0.5 and (b) a = 2. 



which appears superficially to be a smoothed version of PW. 

The kernel W can be used to describe local density es- 
timates such as local MI or NMI [14], where each local 
histogram has its own NMI functional as in ([4]). 

D. Comparing GPV and PW by Scales 

The main difference between GPV and PW is: (1) the 
explicit modelling of the intensity coherence in PW with a 
Gaussian, thus assumes smooth images, where GPV has a 
disjoint view on the isophotes, i.e., no intensity coherence. (2) 
GPV views the template image intensities as a set of classes 
and the target as a continuous image, whereas PW views both 
images, template and target, as continuous. 

VII. Fast Implementations 

We use a quasi-Newton gradient descent algorithm for 
optimizing ([I]). This results in a very fast and general algorithm 
that with only a few changes works for many different loss- 
functions. 

In order to use quasi-Newton methods for optimization, 
we need to derive the gradient of ([TJ w.r.t. the parameters 
of the uniform cubic B-spline, <I>. We use the notation of 
differentials, dg(x) = Dg(x) dx, where D is the partial 
derivative operator. Note that dx is a vector of differentials, not 
the wedge product of its elements as for integration. Further, 
we will only write up non-zero terms that depend on cZ3>. The 
differential of ([T]) is, 



dS = dM + dS, 



(62) 



where arguments have been omitted for brevity. Ignoring 
the regularization term we focus on the differential of the 
similarity measures. For ([5]), the differential is found to be, 

dMun = / F(i, j)dh IyJ di A dj, (63) 

under the mild Leibnitz integration rule, and where 

dh = d(P(I(x) - i)P(J(x) - j) * W(x)) (64) 
= (DP(I(x) - i)dl)P(J(x) - j) * W(x), (65) 

avoiding irrelevant argument for brevity. In contrast, the dif- 
ferential of ^ is dM\m = J Q DF(x,I(x),J(x))dI(x)dx, 
where smoothness typically is imposed on F and/or /. In 
comparison, our formulation §5§ naturally allows for the added 
smoothing in intensity and integration spaces and replaces 
technical difficulties in evaluating DF with Dh. One advan- 
tage is thus that it becomes easier to compare loss-functions 
directly. For ([?]) the differential is found to be, 



dM 



DF(x, h It j(x, i,j)) dh It j(x, i, j) dx A di A dj, 



(66) 



similarly under the mild Leibnitz integration rule. As shown 
in Section |VII| the form of ( [66] ) suggests only a slight 
computational overhead as compared to ( [63] ). The derivatives 
for a range of F's are given in (32). 



Using Leibniz integration rule, the differentials of the dis- 
tributions are given as 



dp/(i,*) = 
dp T (i\x,&) 



1 

W\ 



dpi(i\x, 3>) dx, 



dhi(i, x, <E>) 
" f r hi(j,x,&)dj 
h T (i,x^) J r dh!(j,x^)dj 

U T hi(j,x,®)dj) 2 



(67) 



(68) 



d/ij(z, x, = (dP(I(x, o) - i, P) * W(x, a)) , (69) 

where irrelevant arguments have been omitted for brevity. 
Likewise, we have: 



dpiM^j) = p| J^dp IyR (iJ\x)dx, 
dh IiR (i,j,x) 



dpiMhj\ x ) 



/ r 2 h>i,R(k, Z, x) dk A dl 



hi,R(h J: x ) Jr 2 dhi 7 u{k, Z, x) dh A dl 



^2 ' 



(70) 



(71) 



(/ r 2 h>i,R(k, Z, x) dk A dl)' 
dh IiR (i,j,x) = 

(dP(I(il>, *, a) - i, (3) P(J(il>, a) - j, /?)) * W(x - a). 

(72) 

In the context of Locally orderless images, GPV can be derived 
as follows: 

dhj = d (P(I(x, a) - i, P) * W(x, a)) (73) 
= P(I(x, a) - i, p) * (D x W(x, a)) , (74) 

and the differential w.r.t. x is found to be, 

dh IiR (i,j,x,a,P,a) 

= P(J(x, a) - j, P) ((P(I(4>(x),a) - i, P)) 



*(D x W(x,a)) 



(75) 



In Fig. [18] is shown the pseudo code for sum of squared 
differences using a spatial integration (SSD), Parzen window 
approximation of the general sum of p-norms (PNORM), 
Parzen window and Generalized partial volume approximation 
of normalized mutual information (PW and GPV). Binary 
code interfacing to Matlab is available (33). All kernels used 
in our implementation are 3rd order uniform B- splines as 
well as Boxcar functions in order to reduce computational 
complexity. The code assumes 3D images, cubic B- splines 
for all kernels, and M bins in the histograms. We assume 
that today's processors have equal processing time of, e.g., 



sum, log, sin etc. From the pseudo code in Fig. [18] and the 
annotated computational complexity, we see that PW and GPV 
have almost identical computational complexity. Results by 
actual implementations may vary, but in general the computing 
times for NMI using either GPV or PW are comparable 
in computational complexity to SSD using B -splines. W.r.t. 
memory, GPV requires 192 x N x 8 bytes of memory to obtain 
the speed, where the PW only requires 8 x TV x 8 bytes (on 
64-bit, double precision). 
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# Given 2 images , I and R, and the determinant of the 

# transformation , det , as a function of space , 

# calculate PW for NMI and PNorm, GPV for NMI and 

# SSD, based on N image evaluation points , and 

# M marginal and M A 2 joint histogram bins. Flops are 

# based on cubic B— splines 

FOR N evaluation points 

calculate image spline coeff. 
(60 flops) 

IF (SSD | | PW || PNorm) 

calculate derivative of image spline coeff. 
(48 flops) 

FOR 64 combinations of image spline coeff. 
IF (SSD | | PW || PNorm) 

update image at evaluation point 
(4 flops) 

update image gradient at evaluation point 
(12 flops) 
IF (GPV) 

update histograms 
(4 flops) 
IF (SSD) 

update residual 
(2 flops) 
IF(PW | | PNorm) 

calculate histogram spline coeff. 
(20 flops) 

FOR 16 histogram spline coeff. 
IF (PNorm) 

compute P— norm 

update residual 

update derivative 

(5 flops) 
ELSE 

update histograms 
(2 flops) 
IF(PW || GPV) 

calculate NMI and derivative on histograms 
(9*M~2+6M flops) 
FOR N evaluation points 
IF ( GPV) 

calculate derivative of image spline coeff. 
(48 flops) 

FOR 64 combinations of image spline coeff. 
update derivative of histogram 
(16 flops) 
IF (PW) 

FOR 16 histogram spline coeff. 
update derivative of histogram 
(9 flops) 
update derivatives 
(3 flops) 

# Total flop usage : 

# SSD: 1134N flops 

# PW: 1331N +9JVT2 +6M flops 

# PNorm: 1379N flops 

# GPV: 1383N +91VT2 +6M flops 



Fig. 18. Pseudo code for SSD, NMI using PW and GPV and P-Norm using 
PW. 



# samples 


Similarity measure 


SSD 


PW 


1000000 


Avg. execution time (in sec) 


1.21 


1.63 


Relative exec, time to SSD 


1 


1.34 


Theoretic relative exec, time to SSD 


1 


1.17 




Overhead 


1 


1.13 



TABLE II 

The table shows the average execution time across 100 

FUNCTION EVALUATIONS OF SSD VS PW-NMI FOR 1000000 POINTS 
USING 256 BINS. 



To substantiate our theoretical computation we have per- 
formed some empirical experiments. First we note that the 
overhead of GPV and PW is in general small. The histogram 
calculations will only dominate in the special case of a small 
number of samples and many histogram bins. We have com- 
pared computational complexity empirically for PW and GPV 
registration and SSD. We use cubic B-spline for K, P, and 
W, and histograms with 256 bins for marginal histograms and 
256 2 for the joint histograms. We perform the computations on 
a laptop with i7-core Q820 (Quad-core) operating at 1.7 GHz 
and 12 GB shared memory. All similarity measures have been 
implemented in parallel using the Intel Threading Building 
Blocks library. As the code runs multi-threaded we believe 
that most of the 13% overhead seen in Table H comes from 
the threads, which are initialized twice as many times in PW 
as in SSD. Furthermore, thread blocking can cause further 
latency during histogram update, thus the estimated times for 
single threaded implementation are very close to our estimate 
for large N. 

VIII. Conclusion 

We have introduced Locally Orderless Registration, a frame- 
work that encompasses most of the currently used similarity 
measures. Our framework allow us to divide a wide range 
of similarity measures into 3 categories from simple global 
linear measures such as the P-norm or Huber norm over 
non-linear global measures such as Correlation Coefficient, 
Mutual Information and Normalized Mutual Information to 
position dependent schemes, e.g., Correlation Ratio and spa- 
tially encoded Mutual Information. All of these measures or 
any combination are formulated in a scale space over measure- 
ment, intensity, and integration space offering the flexibility 
to easily create application specific similarity measures in a 
smooth formulation well suited for gradient based schemes. 
We have presented a thorough analysis of the scales in the 
different spaces both theoretically through the moments of 
the density distribution and a simple local image model and 
through rigorous empirically experiments. 

We have extended our previous work IT) on the difference 
between Parzen Window and Generalized Partial Volume. 
Our analysis clearly shows that Generalized Partial Volume 
is an asymmetric density estimator not suited for problems 
that require inverse consistency. Depending on the smoothing, 
we have shown that this error can become larger than a 
single voxel. Thus, Generalized Partial Volume achieves its 
computational speed by making an approximation to the local 
histogram and by using 0-order B-spline as Parzen estimator. 
In (6) it is reported that Parzen Window is more accurate than 
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Generalized Partial Volume for kernels W with a > 0, and we 
show that this is due to the difference in smoothing and not 
due to the properties of the two density estimators. And worse, 
Generalized Partial Volume measures the dissimilarity of the 
images at two different scales, and thus the effect becomes 
more pronounced with increased a - histograms of larger 
areas. 

Our theoretical analysis of the computational complexity 
and the memory requirements demonstrate that the Parzen 
window is more attractive for intensity based registration. 

We believe that the choice of density estimator should be 
based on the particular application. If no intensity context 
exist, e.g., registering classes (Correlation Ratio) Generalized 
Partial Volume is to be preferred over Parzen Window due 
to the fact that coherence in the joint and marginal density 
distributions is a key assumption in Parzen Window. However, 
if intensity images are to be registered, and computational 
efficiency or inverse consistency is a desired property, then our 
analysis reveals that Parzen Window is a far more attractive 
density estimator as compared to Generalized Partial Volume. 
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